Change Llama2 from the Turbine implementation to the Sharktank one #2170

Draft · wants to merge 1 commit into main
Conversation

gpetters-amd
Contributor

There are still two outstanding issues I'd like some comments on, but otherwise this should be basically done.

# TODO: Convert to gguf, delete cache
Contributor Author

The way sharktank recommends generating the .gguf file is to use a CLI tool from llama.cpp. Is that still the best way to extract it, or do we have a way to do it using sharktank?
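For reference, the llama.cpp route usually means its `convert_hf_to_gguf.py` script. A rough sketch of that workflow, assuming a local llama.cpp checkout and a downloaded Hugging Face checkpoint (all paths and the output type are illustrative, not taken from this PR):

```shell
# Assumed setup: a llama.cpp checkout and a local HF checkpoint directory.
git clone https://github.com/ggerganov/llama.cpp
pip install -r llama.cpp/requirements.txt

# convert_hf_to_gguf.py reads the HF checkpoint and writes a single
# .gguf file; sharktank can then load it as a Dataset.
python llama.cpp/convert_hf_to_gguf.py \
  /path/to/Llama-2-7b-hf \
  --outfile llama2-7b-f16.gguf \
  --outtype f16
```

Whether sharktank has since grown a native exporter that removes this llama.cpp dependency is exactly the open question above.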

model = PagedLlamaModelV1(dataset.root_theta, llama_config)

fxb = FxProgramsBuilder(model)
self.torch_ir = export(fxb)
Contributor Author

Not sure why, but this is producing an empty module. Any idea what I'm missing?
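One likely cause, going by iree-turbine's `FxProgramsBuilder` API: `export(fxb)` only emits the programs that were registered on the builder, so calling it on a bare builder with no `@fxb.export_program` entry points produces an empty MLIR module. A sketch of the registration step, where the entry-point name, `example_tokens`, and the call signature are illustrative assumptions rather than the model's real interface:

```python
import torch
from iree.turbine.aot import FxProgramsBuilder, export

model = PagedLlamaModelV1(dataset.root_theta, llama_config)
fxb = FxProgramsBuilder(model)

# Hypothetical example input; the real model takes more arguments
# (attention state, cache, etc.).
example_tokens = torch.zeros(1, 16, dtype=torch.int64)

# Registering an entry point is what gives export(fxb) something to emit.
@fxb.export_program(name="prefill", args=(example_tokens,))
def _(model, tokens):
    return model(tokens)

self.torch_ir = export(fxb)  # module now contains the registered program
```

If no `export_program`-decorated functions exist between constructing `fxb` and calling `export(fxb)`, an empty module is the expected result.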
